Abstract

A streaming model is one where data items arrive over a long period of time, either one item at a
time or in bursts. Typical tasks include computing various statistics over a sliding window of some 
fixed time horizon. What makes the streaming model interesting is that as the time progresses, old 
items expire and new ones arrive. One of the simplest and most central tasks in this model is sampling. 
That is, the task of maintaining up to k uniformly distributed items from a current time-window. We call 
sampling algorithms succinct if they use provably optimal (up to constant factors) worst-case memory to 
maintain k items (either with or without replacement). We stress that in many applications, structures that 
have expected succinct representation as the time progresses are not sufficient. That is, expected small 
memory solutions in a streaming environment will never provide a fixed bounded memory guarantee over 
the lifetime of a (very large) stream, as small probability events eventually happen with probability 1 . 
Thus, in this paper we ask the following question: Are succinct sampling on streams (or S 3) algorithms
possible, and if so, for what models?

Perhaps somewhat surprisingly, we show that S 3 algorithms (i.e. with matching upper and lower 
bounds and worst case fixed memory guarantees) are possible for all variants of the problem mentioned 
above, i.e., both with and without replacement and both for one-at-a-time and bursty arrival models. In 
addition to fixed memory guarantees, our solution has additional benefits that are important in
applications: in the "one-item-at-a-time" model, the samples produced over non-overlapping windows are
completely independent of each other (this was not the case for previous solutions), and in the bursty model,
previous solutions required floating point computations; ours does not. Finally, we use S 3 algorithms to
solve various problems in the sliding windows model, including frequency moments, counting triangles,
entropy and density estimations. For these problems we present the first solutions with provable worst-case
memory guarantees. The results that we arrive at are based on a novel sampling method that could be
of independent interest.



1 Introduction 



A data stream is an ordered, possibly infinite, set of elements, $p_1, p_2, \ldots, p_N, \ldots$, that can be observed
only once. The data stream model recently became extremely useful for numerous applications including 
networking, finance, security, telecommunications, world wide web, and sensor monitoring. Since data 
streams are unbounded, it is impossible to store all data and analyze it off-line using multiple passes (in 
contrast to traditional database systems). As a result, precise calculation of some queries or statistics may 
be infeasible, and approximate solutions are provided. One of the main challenges is to minimize memory 
requirements, while keeping a desirably precise answer. 

Many applications are interested in analyzing only recent data instead of all previously seen elements. 
The sliding window model, introduced by Babcock, Babu, Datar, Motwani and Widom [7], reflects this
interest. In this model we separate past elements into two sets. The most recent elements represent a 
window of active elements, whereas others are expired. An active element may eventually become expired, 
but expired elements stay in this status forever. The sliding window is a set of all currently active elements, 
i.e., $W = \{p_{N-n}, \ldots, p_N\}$, where $N$ is the current size of the stream and $n$ is the number of active elements,
frequently referred to as the window size. Only active elements are relevant for statistics or queries. For
the sequence-based model, the window size is predefined and does not depend on the current status of the
stream. For the timestamp-based model, each element $p$ is associated with a non-decreasing timestamp,
$T(p)$. An element is active if $t - T(p) < t_0$, where $t$ is the current timestamp and $t_0$ is some predefined
and fixed value. Thus, the window size strictly depends on $t$ and can be any non-negative number. We refer
readers to the works of Babcock, Babu, Datar, Motwani and Widom [7], Muthukrishnan [49] and Aggarwal
[1] for more detailed discussions of these models, and related problems and algorithms.

1.1 Questions posed and our results 

Random sampling methods are widely used in data stream processing because of their simplicity and
efficiency. What makes these methods attractive for many applications is that they store elements instead of
synopses, allowing us to change queries in an ad hoc manner and reuse samples a posteriori, with different
algorithms. Further, sampling methods are natural for streams with multi-dimensional elements, while other
methods, such as sketches, wavelets and histograms, are not easily extended to multi-dimensional cases. We
refer readers to the recent surveys by Datar and Motwani [1] (Chapter 9) and Muthukrishnan [49] for deeper
discussions of these and other advantages of sampling methods. What makes sampling non-trivial is that
the domain's size changes constantly, as well as the probabilities associated with elements. We distinguish
between sampling with replacement, where all samples are independent (and thus can be repeated), and its
generalization, sampling without replacement, where repetitions are prohibited. Due to its fundamental
nature, the problem has received considerable attention in the last decades. Vitter [54] presented reservoir
sampling, probably the first algorithm for uniform sampling (with and without replacement) over streams.
A reservoir is an array of size $k$ where the current samples are stored. We choose $p_i$ to be a sample w.p.
1 if $0 \le i < k$, and w.p. $k/(i+1)$ otherwise. If $p_i$ is chosen and there is no space in the reservoir, we delete
one of the previously chosen samples and put $p_i$ instead. This algorithm requires $\Theta(k)$ memory and
generates a uniform random sample without replacement of size $k$. Numerous sampling methods were developed
for different scenarios and distributions. These works include, among many others, concise and counting
sampling by Gibbons and Matias [36]; priority sampling by Duffield, Lund and Thorup (H, Alon, Duffield,
Lund and Thorup [28], and Szegedy [53]; weighted sampling by Chaudhuri, Motwani and Narasayya [19];
faster reservoir sampling by Li [47]; density sampling by Palmer and Faloutsos [50]; and non-uniform
reservoir sampling by Kolonko and Wasch [42]. Several data stream models (including sliding windows) allow
deletions of stale data. In these models sampling becomes even more challenging, since samples eventually
expire and must be replaced. Recent works include chain and priority sampling by Babcock, Datar and
Motwani [8]; biased sampling by Aggarwal [2]; Aggarwal, Han, Wang, and Yu O; sampling in dynamic
streams by Frahling, Indyk and Sohler [33]; and inverse sampling by Cormode, Muthukrishnan and Rozenbaum
[22].
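The reservoir procedure described above can be sketched in a few lines. This is only an illustration, assuming 0-based indexes (so the element at index $i \ge k$ is kept with probability $k/(i+1)$) and a uniformly chosen slot for eviction; the function name `reservoir_sample` and the explicit `rng` argument are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Return up to k items drawn uniformly without replacement from an iterable."""
    reservoir = []
    for i, item in enumerate(stream):            # i is the 0-based index of the item
        if i < k:
            reservoir.append(item)               # the first k items are always kept
        elif rng.random() < k / (i + 1):         # keep item i with probability k/(i+1)
            reservoir[rng.randrange(k)] = item   # evict a uniformly chosen slot
    return reservoir
```

The memory is one array of size k, independent of the stream length, which is what makes the method a natural building block for the bucket samples used later.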
1.1.1 Historic perspective on sampling on sliding windows

In this paper we address the problem of maintaining a random sample of fixed size $k$ for every window. This
problem was introduced in the pioneering paper of Babcock, Datar and Motwani [8] and several solutions are
known. One possible and well-known method, described in that paper, is periodic or systematic sampling.
A sample $p_i$ is picked from the first $n$ elements, defining the sequence of all its replacements as
$p_{i+sn}$, $s = 1, 2, \ldots$. This method provides a deterministic solution for the sampling problem and uses
$O(k)$ memory. However, it was criticized for its inability to deal with periodic data and its vulnerability to
malicious behavior. For more detailed criticism of periodic sampling see, for example, the papers of Duffield [26]
and Paxson, Almes, Mahdavi and Mathis [51]. Therefore, the following requirement is important: samples from
distinct windows should have either weak or no dependency.

Babcock, Datar and Motwani [8] provided the first effective algorithms that do not possess periodic
behavior. The chain algorithm provides samples from sequence-based windows. The algorithm picks every
new element w.p. $1/n$. In addition, for every chosen element, it uniformly selects and stores its replacement
from its $n$ successors, creating a chain of replacements. If a sample expires, its successor in the chain
becomes the new sample. The expected size of this chain is constant, and it is $O(\log n)$ with probability
$1 - 1/n^c$ for a constant $c$. Repeating this $k$ times gives a sampling algorithm with expected optimal memory
$O(k)$ and a high-probability upper bound of $O(k \log n)$. Priority sampling provides samples from
timestamp-based windows. The algorithm associates each element with a priority, a real number randomly chosen
from $[0, 1]$. An element is chosen as a sample if its priority is the highest among all active elements. The
algorithm stores only elements that may potentially become samples. The expected number of such elements is
$O(\log n)$, and the same bound holds with probability $1 - 1/n^c$ for a constant $c$. Therefore, the expected and
high-probability memory is $O(k \log n)$. The authors also note that sampling without replacement can be
simulated, with high probability, by over-sampling with replacement.
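To make the memory behavior of priority sampling concrete, the following is a minimal sketch of a single priority sample over a timestamp-based window. It assumes non-decreasing timestamps and the activity rule "current time minus timestamp is less than the window length"; the class name `PrioritySampler` and its methods are hypothetical, and the code is an illustration of the idea rather than the implementation of [8].

```python
import random
from collections import deque


class PrioritySampler:
    """One uniform sample over a timestamp-based window of length t0."""

    def __init__(self, t0, rng):
        self.t0 = t0
        self.rng = rng
        # (timestamp, priority, item); priorities strictly decrease from left to right
        self.candidates = deque()

    def add(self, timestamp, item):
        priority = self.rng.random()
        # An older element with a lower priority can never become the sample again.
        while self.candidates and self.candidates[-1][1] < priority:
            self.candidates.pop()
        self.candidates.append((timestamp, priority, item))

    def sample(self, now):
        # Drop expired elements; an element is active if now - timestamp < t0.
        while self.candidates and now - self.candidates[0][0] >= self.t0:
            self.candidates.popleft()
        return self.candidates[0][2] if self.candidates else None
```

The length of `candidates` is the quantity that is logarithmic only in expectation and with high probability, which is exactly the kind of memory fluctuation that a worst-case guarantee rules out.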

Recently, Zhang, Li, Yu, Wang and Jiang [55] provided sampling algorithms for sliding windows. These
algorithms use linear memory, work only for small windows, and therefore cannot be compared to our 
results. 

1.1.2 Our contribution 

In this paper we optimally solve the problem of sampling on sliding windows. We refine the requirements
above, defining two critical properties of sampling algorithms for sliding windows. First, they must use
provably optimal memory. Second, we require complete independence for non-overlapping windows,
refining the ideas from [8]. By Succinct Sampling on Streams (or S 3) we denote an algorithm that satisfies the
above requirements. In this paper we ask the following question: Are S 3 algorithms possible, and if so, for
what models? We show that S 3 algorithms for uniform distribution are possible for all variants of the
problem, i.e., both with and without replacement and both for sequence and timestamp-based windows. That is,
for all these models we present matching upper and lower bounds. For sequence-based windows, we use
$\Theta(k)$ memory to generate a sample of size $k$, both with and without replacement. For timestamp-based
windows, we present $\Theta(k \log n)$-memory algorithms, where $n$ is the number of non-expired elements, and
prove that this is optimal.

From the theoretical perspective, S 3 algorithms are important since their improvements over the chain 
and priority methods can be arbitrarily large, in the worst case. That is, for any m = o(n) there exists a data 
stream D such that the ratio between maximal memory over all windows for the previous solutions and S 3 
algorithms is at least $m$. Consider, for instance, chain sampling, where a chain of replacements is maintained.
The probability that each of the $k$ chains in [8] has length $m$ (or larger) is at least $1/m^{km}$. Therefore, for
streams of size $N > m^{mk}$, the expected maximal memory usage over all windows is at least $\Omega(mk)$. In contrast,
our method requires $O(k)$ memory. A similar result can be obtained for timestamp-based windows.

One may argue that in practice the requirement of producing a sample for every window is too rigid. 
The relaxed version of the problem, where samples are outputted for all windows except a small fraction, 
is sufficient. Indeed, any statistic that is based on sampling accepts an error with small probability. Thus, 
by cutting off the windows with large chains, we can maintain the desired statistics and only increase the
probability of error by a negligible amount. Moreover, if the size of the entire stream is polynomial in the
window size, i.e., $N = poly(n)$, then the probability of error on the whole stream can be made as small as
$1/poly(N)$. As a result, one may claim that the improvement provided in our paper is incremental from the
practical perspective. This argument is flawed for two important reasons.

First, this argument misses the important point that for a constant number of samples, our algorithms
are asymptotically superior to the previous algorithms. Even for the relaxed version of the problem, S 3
algorithms are strictly superior to the previous solutions. In fact, we need at most two samples per window
for sequence-based windows and at most $3 \log n$ samples for timestamp-based windows plus $4 \log^2 n$ bits.
In contrast, the previous solutions accept memory fluctuations. Restricting these solutions to the cases where
memory bounds are close to ours creates a bias toward recent elements, resulting in non-uniform sampling.
To illustrate this point, let $p_i$ be the conditional probability that the $(N-i)$-th element is chosen, given that
the size of the chain is bounded by 2. Then we can show that $p_0/p_n > 2$. The bias for priority sampling
is smaller; nevertheless, it is computationally distinguishable for polynomially bounded streams. Thus, S 3
sampling either strictly improves the memory usage or eliminates the non-uniformity of samples (or both).
Both properties are of great importance for practical applications, as has been pointed out in [51].

Second, our results are new in the following sense. As we mentioned above, deletions introduce
additional difficulties: samples expire and the size of the domain is unknown. Efficient algorithms for various
streaming models that support deletions exist, such as dynamic sampling [33], inverse sampling [22] and
biased sampling [2]. However, all of these results accept a small probability of failure, either in terms of
distribution or in terms of memory guarantees. S 3 algorithms are the first schemes with zero probability
of error. We are able to generate random events with the required probabilities even without actually knowing
the precise value of $n$. Thus, our technique is of independent interest and can possibly be used in other models
that support deletions.

Probably the most important impact of our results is the ability to translate any streaming algorithm 
that is based on uniform sampling to sliding windows, while preserving worst-case memory guarantees. To 
emphasize this point, we present in Section 6 a "sample" of such algorithms. The questions asked there have 
natural extensions to sliding windows. Translation of these algorithms to sliding windows is straightforward: 
we replace the underlying sampling algorithm with S 3 . In particular, we address the following problems:
frequency moments, counting triangles in graphs, entropy estimation, and density estimation. We believe 
that this is only a small subset of problems that can be addressed using S 3 , and thus it may become a 
powerful tool in the sliding windows model. 

1.2 A new sampling method - high-level ideas 

For the sequence-based window, we divide the stream into buckets of the same size as the window. For
each of them we maintain a sample. At any time, the window intersects up to two buckets, say $B_1, B_2$. For
a single sample the algorithm is simple: if the sample from $B_1$ is active, choose it; otherwise choose the
sample from $B_2$. Since the number of expired elements in $B_1$ is equal to the number of arrived elements in
$B_2$, the uniform distribution is preserved. To create a $k$-sample with replacement, we repeat the procedure
$k$ times. We can generalize the idea to a $k$-sample without replacement. We generate $k$ samples without
replacement in every bucket, using the reservoir algorithm, and combine them as follows. If $i$ samples are
expired in $B_1$, take the $k - i$ active samples from $B_1$ and $i$ samples from $B_2$. Simple analysis shows that the
distribution is uniform.

For the S 3 algorithm with replacement from the timestamp-based window, we maintain a list of buckets,
the $\zeta$-decomposition. The last bucket, $B$, may contain both expired and active elements but is smaller than
the union of the other buckets. For each bucket we maintain an $O(1)$-memory structure that contains independent
samples from the bucket and other statistics. We can combine bucket samples with corresponding
probabilities to generate samples of their union. However, we cannot easily combine the last bucket's samples,
since the number of active elements, $n$, is unknown. To overcome this problem, we exploit the fact that a random
sample of $B$ chooses a (fixed) active element $p$ w.p. $\frac{1}{|B|} \ge \frac{1}{n}$, which can be reduced to
$\frac{1}{n}$ by generating an independent event w.p. $\frac{|B|}{n}$. We prove that it is possible to generate such
an event without knowing $n$.

Finally, we show that a $k$-sample without replacement may be generated from $k$ independent samples,
$R_0, \ldots, R_{k-1}$, where $R_i$ samples all but the last $i$ active elements. Such samples can be generated if,
in addition, we store the last $k$ elements.

Our algorithms generate independent samples for non-overlapping windows. The independence follows
from a nice property of the reservoir algorithm (which we use to generate samples in the buckets). Let $R_1$ be
a sample generated for the bucket $B$ upon arrival of $i$ elements of $B$. Let $R_2$ be the fraction of the final sample
(i.e., the sample when the last element of $B$ arrives) that belongs to the last $|B| - i$ elements. The reservoir
algorithm implies that $R_1$ and $R_2$ are independent. Since the rest of the buckets contain independent samples
as well, we conclude that S 3 is independent for non-overlapping windows.

1.3 Related work 

Substantial work has been done in the streaming model including (among many others) the following
papers. The frequency moments problem was introduced and studied by Alon, Matias and Szegedy [5],
and then by Bhuvanagiri, Ganguly, Kesh and Saha [13], Bar-Yossef, Jayram, Kumar and Sivakumar [10],
Chakrabarti, Khot and Sun [18], Coppersmith and Kumar [23], Ganguly 0411, and Indyk and Woodruff [41].
Graph algorithms were studied by Bar-Yossef, Kumar and Sivakumar [11], Buriol, Frahling, Leonardi,
Marchetti-Spaccamela and Sohler [15], Feigenbaum, Kannan, McGregor, Suri and Zhang [30], [31], and
Jowhari and Ghodsi [43]. Entropy approximation was researched by Chakrabarti, Cormode and McGregor
[16], Chakrabarti, Do Ba and Muthukrishnan [17], Guha, McGregor and Venkatasubramanian [39], and Lall,
Sekar, Ogihara, Xu and Zhang [44]. Clustering problems were studied by Aggarwal, Han, Wang, and Yu
0, Guha, Meyerson, Mishra, Motwani and O'Callaghan [40], and Palmer and Faloutsos [50]. The problem
of estimating the number of distinct elements was addressed by Bar-Yossef, Jayram, Kumar, Sivakumar and
Trevisan [12], Cormode, Datar, Indyk, and Muthukrishnan [21] and Ganguly [35].

Datar, Gionis, Indyk and Motwani [24] pioneered the research in this area, presenting exponential
histograms, effective and simple solutions for a wide class of functions over sliding windows. In particular,
they gave a memory-optimal algorithm for count, sum, average, $L_p$, $p \in [1, 2]$, and other functions. Gibbons
and Tirthapura [37] improved the results for sum and count, providing memory- and time-optimal
algorithms. Feigenbaum, Kannan and Zhang [29] addressed the problem of computing the diameter. Lee and Ting
[46] gave a memory-optimal solution for the relaxed version of the count problem. Chi, Wang, Yu and
Muntz [20] addressed the problem of frequent itemsets. Algorithms for frequency counts and quantiles were
proposed by Arasu and Manku JQ. Further improvement for counts was reported by Lee and Ting P31.
Babcock, Datar, Motwani and O'Callaghan [9] provided an effective solution of the variance and $k$-medians
problems. Algorithms for rarity and similarity were proposed by Datar and Muthukrishnan [25]. Golab,
DeHaan, Demaine, Lopez-Ortiz and Munro [38] provided an effective algorithm for finding frequent elements.
Detailed surveys of recent results can be found in [49], [1].

1.4 Roadmap and notations 

We use the following notations throughout our paper. We denote by $D$ a stream and by $p_i$, $i \ge 0$, its $i$-th
element. For $0 \le x < y$ we define $[x, y] = \{i : x \le i \le y\}$. Finally, bucket $B(x, y)$ is the set of all stream
elements between $p_x$ and $p_{y-1}$: $B(x, y) = \{p_i : i \in [x, y-1]\}$.

Sections 2 and 3 present S 3 algorithms for sequence-based windows, with and without replacement. 
Sections 4 and 5 are devoted to S 3 algorithms for timestamp-based windows, with and without replacement.
Section 6 outlines possible applications for our approach. Due to the lack of space, some proofs are omitted 
from the main body of the paper, but they can all be found in the appendix. 

2 S 3 Algorithm With Replacement for Sequence-Based Windows 

Let $n$ be the predefined size of a window. We say that a bucket is active if all its elements have arrived and
at least one element is non-expired. We say that a bucket is partial if not all of its elements have arrived. We
show below how to create a single random sample. To create a $k$-random sample, we repeat the procedure
$k$ times, independently.

We divide $D$ into buckets $B(in, (i+1)n)$, $i = 0, 1, \ldots$. At any point of time, we have exactly one active
bucket and at most one partial bucket. For every such bucket $B$, we independently generate a single sample,
using the reservoir algorithm [54]. We denote this sample by $X_B$.

Let $B$ be a partial bucket and $C \subseteq B$ be the set of all arrived elements. The properties of the reservoir
algorithm imply that $X_B$ is a random sample of $C$.

Below, we construct a random sample $Z$ of all non-expired elements. Let $U$ be the active bucket. If
there is no partial bucket, then $U$ contains exactly the non-expired elements. Therefore, $Z = X_U$ is a valid
sample. Otherwise, let $V$ be the partial bucket. Let $U_e = \{x : x \in U, x \text{ is expired}\}$,
$U_a = \{x : x \in U, x \text{ is non-expired}\}$, $V_a = \{x : x \in V, x \text{ arrived}\}$.

Note that $|V_a| = |U_e|$ and let $s = |V_a|$. Also, note that our window is $U_a \cup V_a$ and $X_V$ is a random
sample of $V_a$. The random sample $Z$ is constructed as follows. If $X_U$ is not expired, we put $Z = X_U$,
otherwise $Z = X_V$. To prove the correctness, let $p$ be a non-expired element. If $p \in U_a$, then
$P(Z = p) = P(X_U = p) = \frac{1}{n}$. If $p \in V_a$, then

$$P(Z = p) = P(X_U \in U_e, X_V = p) = P(X_U \in U_e) P(X_V = p) = \frac{s}{n} \cdot \frac{1}{s} = \frac{1}{n}.$$

Therefore, $Z$ is a valid random sample. We need to store only samples of active or partial buckets. Since the
number of such buckets is at most two and the reservoir algorithm requires $\Theta(1)$ memory, the total memory
of our algorithm for a $k$-sample is $\Theta(k)$.
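The construction of this section can be sketched as follows. This is a minimal illustration under stated assumptions: stream positions are 0-based, each stored sample is an (index, item) pair, and before the first bucket completes the window is taken to be everything that has arrived. The class name `SequenceWindowSampler` and its fields are hypothetical.

```python
import random
from collections import Counter


class SequenceWindowSampler:
    """One uniform sample over the last n items of a stream (0-based positions)."""

    def __init__(self, n, rng):
        self.n, self.rng = n, rng
        self.count = 0        # number of items seen so far
        self.full = None      # (index, item): sample X_U of the last completed bucket
        self.partial = None   # (index, item): reservoir sample X_V of the partial bucket

    def add(self, item):
        pos = self.count % self.n                  # position inside the current bucket
        if self.rng.random() < 1.0 / (pos + 1):    # reservoir step with a single slot
            self.partial = (self.count, item)
        self.count += 1
        if self.count % self.n == 0:               # the bucket is complete and becomes active
            self.full, self.partial = self.partial, None

    def sample(self):
        if self.full is None:                      # fewer than n items have arrived
            return self.partial
        oldest_active = self.count - self.n
        if self.full[0] >= oldest_active:          # X_U has not expired
            return self.full
        return self.partial                        # otherwise use X_V
```

Only two (index, item) pairs are stored at any time, regardless of the random choices made, which is the worst-case bound the section argues for.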



3 S 3 Algorithm Without Replacement for Sequence-Based Windows 

We can generalize the idea above to provide a $k$-random sample without replacement. In this section,
$k$-sample means $k$-random sampling without replacement.

We use the same buckets $B(in, (i+1)n)$, $i = 0, 1, \ldots$. For every such bucket $B$, we independently
generate a $k$-sample $X_B$, using the reservoir algorithm.

Let $B$ be a partial bucket and $C \subseteq B$ be the set of all arrived elements. The properties of the reservoir
algorithm imply that either $X_B = C$, if $|C| \le k$, or $X_B$ is a $k$-sample of $C$. In both cases, we can generate
an $i$-sample of $C$ using $X_B$ only, for any $0 \le i \le \min(k, |C|)$.

Our algorithm is as follows. Let $U$ be the active bucket. If there is no partial bucket, then $U$ contains
exactly the active elements. Therefore, we can put $Z = X_U$. Otherwise, let $V$ be the partial bucket. We
define $U_e, U_a, V_a, s$ as before and construct $Z$ as follows. If all elements of $X_U$ are not expired, $Z = X_U$.
Otherwise, let $i$ be the number of expired elements, $i = |U_e \cap X_U|$. As we mentioned before, we can
generate an $i$-sample of $V_a$ from $X_V$, since $i \le \min(k, s)$. We denote this sample as $X_V^i$ and put

$$Z = (X_U \cap U_a) \cup X_V^i.$$

We will now prove that $Z$ is a valid random sample. Let $Q = \{p_{j_1}, \ldots, p_{j_k}\}$ be a fixed set of $k$
non-expired elements such that $j_1 < j_2 < \ldots < j_k$. Let $i = |Q \cap V_a|$, so
$\{p_{j_1}, \ldots, p_{j_{k-i}}\} \subseteq U_a$ and $\{p_{j_{k-i+1}}, \ldots, p_{j_k}\} \subseteq V_a$. If $i = 0$, then $Q \subseteq U$ and

$$P(Z = Q) = P(X_U = Q) = \frac{1}{\binom{n}{k}}.$$

Otherwise, by independence of $X_U$ and $X_V^i$,

$$P(Z = Q) = P(|X_U \cap U_e| = i, \{p_{j_1}, \ldots, p_{j_{k-i}}\} \subseteq X_U, X_V^i = \{p_{j_{k-i+1}}, \ldots, p_{j_k}\})$$

$$= P(|X_U \cap U_e| = i, \{p_{j_1}, \ldots, p_{j_{k-i}}\} \subseteq X_U) \, P(X_V^i = \{p_{j_{k-i+1}}, \ldots, p_{j_k}\}) = \frac{\binom{s}{i}}{\binom{n}{k}} \cdot \frac{1}{\binom{s}{i}} = \frac{1}{\binom{n}{k}}.$$

Therefore, $Z$ is a valid random sample of non-expired elements. Note that we store only samples of
active or partial buckets. Since the number of such buckets is at most two and the reservoir algorithm
requires $O(k)$ memory, the total memory of our algorithm is $O(k)$.
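The combination rule of this section can be sketched in the same style. The sketch assumes $k \le n$, 0-based positions, and that an $i$-sample of the arrived part of the partial bucket is obtained by drawing $i$ entries uniformly from its reservoir; the names `reservoir_step` and `SequenceWindowKSampler` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def reservoir_step(reservoir, k, pos, entry, rng):
    """Reservoir update for the entry at 0-based position pos of its bucket."""
    if pos < k:
        reservoir.append(entry)
    elif rng.random() < k / (pos + 1):
        reservoir[rng.randrange(k)] = entry


class SequenceWindowKSampler:
    """k items without replacement from the last n stream items (assumes k <= n)."""

    def __init__(self, n, k, rng):
        self.n, self.k, self.rng = n, k, rng
        self.count = 0
        self.full = None      # k-sample X_U of the last completed bucket
        self.partial = []     # reservoir X_V of the partial bucket

    def add(self, item):
        reservoir_step(self.partial, self.k, self.count % self.n,
                       (self.count, item), self.rng)
        self.count += 1
        if self.count % self.n == 0:
            self.full, self.partial = self.partial, []

    def sample(self):
        if self.full is None:                       # fewer than n items have arrived
            return list(self.partial)
        oldest_active = self.count - self.n
        alive = [e for e in self.full if e[0] >= oldest_active]   # X_U restricted to U_a
        expired = self.k - len(alive)                              # i = |U_e and X_U|
        return alive + self.rng.sample(self.partial, expired)     # add an i-sample of V_a
```

At most 2k (index, item) pairs are held at any moment, matching the bound stated at the end of the section.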
4 S 3 Algorithm With Replacement for Timestamp-Based Windows



Let $n = n(t)$ be the number of non-expired elements. For each element $p$, the timestamp $T(p)$ represents the
moment of $p$'s entrance. For a window with (predefined) parameter $t_0$, $p$ is active at time $t$ if $t - T(p) < t_0$.
We show below how to create a single random sample. To create a $k$-random sample, we repeat the
procedure $k$ times, independently.

4.1 Notations 

A bucket structure $BS(x, y)$ is a group $\{p_x, x, y, T(x), R_{x,y}, Q_{x,y}, r, q\}$, where $T(x)$ is the timestamp of $p_x$,
$R, Q$ are independent random samples from $B(x, y)$ and $r, q$ are the indexes of the picked (for random samples)
elements. We denote by $N(t)$ the size of $D$ at the moment $t$ and by $l(t)$ the index of the earliest active
element. Note that $N(t) \le N(t+1)$, $l(t) \le l(t+1)$ and $T(p_i) \le T(p_{i+1})$.

4.2 ζ-decomposition

Let $a \le b$ be two indexes. The $\zeta$-decomposition of a bucket $B(a, b)$, $\zeta(a, b)$, is an ordered set of bucket
structures with independent samples, inductively defined below.

$$\zeta(b, b) := BS(b, b+1),$$

and for $a < b$,

$$\zeta(a, b) := \{BS(a, c), \zeta(c, b)\},$$

where $c = a + 2^{\lfloor \log(b+1-a) \rfloor - 1}$.

Note that $|\zeta(a, b)| = O(\log(b-a))$, so $\zeta(a, b)$ uses $O(\log(b-a))$ memory.
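A small sketch may help to see which buckets the decomposition contains. It only computes bucket boundaries (no samples) and assumes the split point is $c = a + 2^{\lfloor \log_2(b+1-a) \rfloor - 1}$ with a base-2 logarithm; the function name `zeta` is hypothetical.

```python
def zeta(a, b):
    """Boundaries (x, y) of the buckets B(x, y) in the decomposition of p_a, ..., p_b."""
    buckets = []
    while a < b:
        size = b + 1 - a
        # size.bit_length() - 1 equals floor(log2(size)), so this is 2^(floor(log2(size)) - 1)
        c = a + 2 ** (size.bit_length() - 2)
        buckets.append((a, c))
        a = c
    buckets.append((b, b + 1))    # base case: the single-element bucket BS(b, b+1)
    return buckets
```

For example, `zeta(0, 7)` returns `[(0, 4), (4, 6), (6, 7), (7, 8)]`: bucket sizes shrink geometrically toward the newest element, which is why the number of buckets is logarithmic in the number of covered elements.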
Given $p_{b+1}$, we inductively define an operator $Incr(\zeta(a, b))$ as follows.

$$Incr(\zeta(b, b)) := (BS(b, b+1), BS(b+1, b+2)).$$

For $a < b$, we put

$$Incr(\zeta(a, b)) := (BS(a, v), Incr(\zeta(v, b))),$$

where $v$ is defined below.

If $\lfloor \log(b+2-a) \rfloor = \lfloor \log(b+1-a) \rfloor$ then we put $v = c$, where $BS(a, c)$ is the first bucket structure
of $\zeta(a, b)$. Otherwise, we put $v = d$, where $BS(c, d)$ is the second bucket structure of $\zeta(a, b)$. (Note that
$\zeta(a, b)$ contains at least two buckets for $a < b$.)

We show how to construct $BS(a, d)$ from $BS(a, c)$ and $BS(c, d)$. We have in this case
$\lfloor \log(b+2-a) \rfloor = \lfloor \log(b+1-a) \rfloor + 1$, and therefore $b + 1 - a = 2^i - 1$ for some $i \ge 2$. Thus
$c - a = 2^{\lfloor \log(2^i - 1) \rfloor - 1} = 2^{i-2}$ and

$$\lfloor \log(b+1-c) \rfloor = \lfloor \log(b+1-a-(c-a)) \rfloor = \lfloor \log(2^i - 2^{i-2} - 1) \rfloor = i - 1.$$

Thus $d - c = 2^{\lfloor \log(b+1-c) \rfloor - 1} = 2^{i-2} = c - a$. Now we can create $BS(a, v)$ by unifying $BS(a, c)$
and $BS(c, d)$: $BS(a, v) = \{p_a, d - a, R_{a,d}, Q_{a,d}, r', q'\}$. We put $R_{a,d} = R_{a,c}$ with probability $\frac{1}{2}$ and
$R_{a,d} = R_{c,d}$ otherwise. Since $d - c = c - a$, and $R_{c,d}$, $R_{a,c}$ are distributed uniformly, we conclude that
$R_{a,d}$ is distributed uniformly as well. $Q_{a,d}$ is defined similarly and $r'$, $q'$ are the indexes of the chosen samples.
Finally, the new samples are independent of the rest of $\zeta$'s samples. Note also that $Incr(\zeta(a, b))$ requires
$O(\log(b - a))$ operations.

Lemma 4.1. For any $a$ and $b$, $Incr(\zeta(a, b)) = \zeta(a, b+1)$.
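The statement of Lemma 4.1 can be checked mechanically on bucket boundaries. The sketch below tracks boundaries only (it does not merge the samples R and Q) and assumes base-2 logarithms; `zeta` and `incr` are hypothetical helper names. The floor-of-logarithm comparison is done with `bit_length`, since two positive integers have the same `bit_length` exactly when their floor(log2) values agree.

```python
def zeta(a, b):
    """Boundaries (x, y) of the buckets in the decomposition of p_a, ..., p_b."""
    buckets = []
    while a < b:
        c = a + 2 ** ((b + 1 - a).bit_length() - 2)   # 2^(floor(log2(b+1-a)) - 1)
        buckets.append((a, c))
        a = c
    buckets.append((b, b + 1))
    return buckets


def incr(buckets):
    """Boundaries after p_{b+1} arrives, given the boundaries of zeta(a, b)."""
    b = buckets[-1][0]                      # the last bucket is always B(b, b+1)
    out, i = [], 0
    while i < len(buckets) - 1:             # case a < b: at least two buckets remain
        a = buckets[i][0]
        if (b + 2 - a).bit_length() == (b + 1 - a).bit_length():
            out.append(buckets[i])          # v = c: keep the first bucket unchanged
            i += 1
        else:
            out.append((a, buckets[i + 1][1]))   # v = d: merge the first two buckets
            i += 2
    out += [(b, b + 1), (b + 1, b + 2)]     # base case Incr(zeta(b, b))
    return out
```

Each call touches every bucket at most once, so an update costs time proportional to the number of buckets.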

Lemma 4.2. For any $t$ with a positive number of active elements, we are able to maintain one of the
following:

1. $\zeta(l(t), N(t))$,
or

2. $BS(y_t, z_t), \zeta(z_t, N(t))$,

where $y_t < l(t) \le z_t$, $z_t - y_t \le N(t) + 1 - z_t$ and all random samples are independent.
4.3 Sample generation



We use the following notations for this section. Let $B_1 = B(a, b)$ and $B_2 = B(b, N(t) + 1)$ be two buckets
such that $p_a$ is expired, $p_b$ is active and $|B_1| \le |B_2|$. Let $BS_1$ and $BS_2$ be the corresponding bucket structures,
with independent random samples $R_1, Q_1$ and $R_2, Q_2$. We put $\alpha = b - a$ and $\beta = N(t) + 1 - b$. Let $\gamma$ be
the (unknown) number of non-expired elements inside $B_1$, so $n = \beta + \gamma$. We stress that $\alpha, \beta$ are known and
$\gamma$ is unknown.

Lemma 4.3. It is possible to generate a random sample $Y = Y(Q_1)$ of $B_1$, with the following distribution:

$$P(Y = p_{b-i}) = \frac{\beta}{(\beta + i)(\beta + i - 1)}, \quad 0 < i < \alpha,$$

$$P(Y = p_a) = \frac{\beta}{\beta + \alpha - 1}.$$

$Y$ is independent of $R_1, R_2, Q_2$ and can be generated within constant memory and time, using $Q_1$.

Lemma 4.4. It is possible to generate a zero-one random variable $X$ such that $P(X = 1) = \frac{\alpha}{\beta + \gamma}$. $X$ is
independent of $R_1, R_2, Q_2$ and can be generated using constant time and memory.

Lemma 4.5. It is possible to construct a random sample $V$ of all non-expired elements using only the data
of $BS_1$, $BS_2$ and constant time and memory.
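The three lemmas fit together arithmetically, and exact rational arithmetic makes this easy to check. The sketch below rests on several assumptions that are not taken from the text, whose proofs are deferred to the appendix: the distribution in Lemma 4.3 is read as $P(Y = p_{b-i}) = \beta / ((\beta+i)(\beta+i-1))$ for $0 < i < \alpha$ and $P(Y = p_a) = \beta/(\beta+\alpha-1)$; $X$ is taken to be 1 when $Y$ lands on an expired element and an extra independent coin with bias $\alpha/\beta$ succeeds; and $V$ outputs $R_1$ when $X = 1$ and $R_1$ is active, and $R_2$ otherwise. The function names are hypothetical.

```python
from fractions import Fraction


def y_distribution(alpha, beta):
    """P(Y = p_{b-i}) for i = 1..alpha, where i = alpha stands for p_a."""
    dist = {i: Fraction(beta, (beta + i) * (beta + i - 1)) for i in range(1, alpha)}
    dist[alpha] = Fraction(beta, beta + alpha - 1)
    return dist


def window_distribution(alpha, beta, gamma):
    """Exact law of V. Keys i >= 1 mean p_{b-i} in B_1; keys 0, -1, ... mean elements of B_2."""
    y = y_distribution(alpha, beta)
    # The active elements of B_1 are p_{b-1}, ..., p_{b-gamma}; larger offsets are expired.
    p_expired = sum(p for i, p in y.items() if i > gamma)
    p_x = p_expired * Fraction(alpha, beta)      # assumed extra coin with bias alpha/beta
    # X = 1 and R_1 active: output R_1 (uniform on B_1); otherwise output R_2 (uniform on B_2).
    v = {i: p_x * Fraction(1, alpha) for i in range(1, gamma + 1)}
    rest = 1 - sum(v.values())
    for j in range(beta):
        v[-j] = rest * Fraction(1, beta)
    return p_expired, p_x, v
```

Under these assumptions the event "Y is expired" has probability $\beta/(\beta+\gamma)$, so $X$ has bias $\alpha/(\beta+\gamma)$ even though $\gamma$ is never used to generate it; $\gamma$ appears in the code only to evaluate the resulting probabilities.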



4.4 Main results 

Theorem 4.6. We can maintain a random sample over all non-expired elements using $\Theta(\log n)$ memory.

Proof. By Lemma 4.2, we are able to maintain one of two cases. If case 1 occurs, we can combine the
random variables of all bucket structures with appropriate probabilities and get a random sample of all
non-expired elements. If case 2 occurs, we use the notations of Section 4.3, interpret the first bucket as $B_1$ and
combine the buckets of the $\zeta$-decomposition to generate samples from $B_2$. The properties of the second case imply
$|B_1| \le |B_2|$ and therefore, by Lemma 4.5, we are able to produce a random sample as well. All
procedures described in the lemmas require $\Theta(\log n)$ memory. Therefore, the theorem is correct. □
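The phrase "combine with appropriate probabilities" in the proof can be illustrated as follows: given one uniform sample per bucket, pick a bucket with probability proportional to its size and return its sample. This is a sketch of that single step only, assuming the bucket sizes are known; the function name `combine` is hypothetical.

```python
import random
from collections import Counter


def combine(bucket_samples, rng):
    """bucket_samples: list of (size, sample), each sample uniform over its own bucket."""
    total = sum(size for size, _ in bucket_samples)
    u = rng.randrange(total)        # a uniform position in the union of the buckets
    for size, sample in bucket_samples:
        if u < size:
            return sample           # this bucket is selected with probability size / total
        u -= size
```

A fixed element of a bucket of size s is returned with probability (s / total) * (1 / s) = 1 / total, so the output is uniform over the union.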

Lemma 4.7. The memory usage of maintaining a random sample within a timestamp-based window has a
lower bound of $\Omega(\log n)$.



5 S 3 Algorithm Without Replacement for Timestamp-Based Windows 

Informally, the idea is as follows. We maintain $k$ independent random samples $R_0, \ldots, R_{k-1}$ of active
elements, using the algorithm from Section 4. The difference between these samples and the $k$-sample with
replacement is that $R_i$ samples all active elements except the last $i$. This can be done using $O(k + k \log n)$
memory. Finally, a $k$-sample without replacement can be generated using $R_0, \ldots, R_{k-1}$ only.

Let us describe the algorithm in detail. First, we construct $R_i$. To do this, we maintain an auxiliary
array with the last $i$ elements. We repeat all procedures in Section 4, but we "delay" the last $i$ elements. An
element is added to the $\zeta$-decomposition only when more than $i$ elements arrive after it. We prove the following
variant of Lemma 4.2.

Lemma 5.1. Let $0 \le i < k$. For any $t$ with more than $i$ active elements, we are able to maintain one of the
following:

1. $\zeta(l(t), N(t) - i)$,
or

2. $BS(y_t, z_t), \zeta(z_t, N(t) - i)$,

where $y_t < l(t) \le z_t$ and $z_t - y_t \le N(t) + 1 - i - z_t$ and all random samples of the bucket structures
are independent.

The proof is presented in the appendix. The rest of the procedure remains the same. Note that we can
use the same array for every $i$, and therefore we can construct $R_0, \ldots, R_{k-1}$ using $\Theta(k + k \log n)$ memory.

In the remainder of this section, we show how $R_0, \ldots, R_{k-1}$ can be used to generate a $k$-sample without
replacement. We denote by $R_i^j$ an $i$-random sample without replacement from $[1, j]$.

Lemma 5.2. $R_{a+1}^{b+1}$ can be generated using independent $R_a^b$, $R_1^{b+1}$ samples only.

Lemma 5.3. $R_k^n$ can be generated using only independent samples $R_1^n, R_1^{n-1}, \ldots, R_1^{n-k+1}$.

Proof. By using Lemma 5.2, we can generate $R_2^{n-k+2}$ using $R_1^{n-k+1}$ and $R_1^{n-k+2}$. We can repeat this
procedure and generate $R_j^{n-k+j}$, $2 \le j \le k$, using $R_{j-1}^{n-k+j-1}$ (that we already constructed by induction)
and $R_1^{n-k+j}$. For $j = k$ we have $R_k^n$. □

By using Lemma 5.3, we can generate a $k$-sample without replacement using only $R_0, \ldots, R_{k-1}$.
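One standard way to realize the step in Lemma 5.2 is sketched below; the paper's own proof is in its appendix and may differ. The assumption is that the inputs are a uniform $a$-subset of $\{1, \ldots, b\}$ and an independent uniform element $x$ of $\{1, \ldots, b+1\}$: if $x$ is not already in the subset it is added, otherwise $b+1$ is added. The function names are hypothetical, and the second function enumerates all inputs to compute the exact output distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def extend(sample, x, b):
    """Combine an a-subset of {1..b} with one element x of {1..b+1} into an (a+1)-subset."""
    return frozenset(sample) | {x if x not in sample else b + 1}


def extension_distribution(a, b):
    """Exact distribution of extend() when both inputs are uniform and independent."""
    subsets = list(combinations(range(1, b + 1), a))
    weight = Fraction(1, len(subsets) * (b + 1))
    dist = Counter()
    for s in subsets:
        for x in range(1, b + 2):
            dist[extend(s, x, b)] += weight
    return dist
```

Applying `extend` repeatedly, first to a single sample of the first n-k+1 positions and then with single samples of n-k+2, ..., n positions, follows the induction in the proof of Lemma 5.3.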



6 Applications 

Suppose that algorithm A is sampling-based, i.e., it operates on a uniformly chosen subset of D instead of
the whole stream. Such an algorithm can be immediately transformed to sliding windows by replacing 
the underlying sampling method with S 3 . We obtain the following general result and illustrate it with the 
examples below. 

Corollary 6.1. For a sampling-based algorithm $A$ that solves problem $P$, there exists an algorithm $A'$ that
solves $P$ on sliding windows. The memory guarantees are preserved for sequence-based windows and have
a multiplicative overhead of $\log n$ for timestamp-based windows.

Frequency moment is a fundamental problem in data stream processing. Given a stream of elements $p_j$
such that $p_j \in [m]$, the frequency of each $i \in [m]$ is defined as $x_i = |\{j \mid p_j = i\}|$ and the $k$-th frequency moment
is defined as $F_k = \sum_{i=1}^{m} x_i^k$. The first algorithm for frequency moments for $k > 2$ was proposed in the
seminal paper of Alon, Matias and Szegedy [5]. They present an algorithm that uses memory.
Numerous improvements to lower and upper bounds have been reported, including the works of Bar-Yossef,
Jayram, Kumar and Sivakumar [10], Chakrabarti, Khot and Sun [18], Coppersmith and Kumar [23], and
Ganguly. Finally, Indyk and Woodruff [41] and later Bhuvanagiri, Ganguly, Kesh and Saha [13] presented
algorithms that use memory and are optimal. The algorithm of Alon, Matias and Szegedy
[5] is sampling-based, thus we can adapt it to sliding windows using $S^3$. The memory usage is not optimal;
however, this is the first algorithm for frequency moments over sliding windows that works for all $k$.
Recently Braverman and Ostrovsky [14] adapted the algorithm from [13] to sliding windows, producing a
memory-optimal algorithm that uses $O(m^{1-\frac{2}{k}})$. However, it involves a $k^k$ multiplicative overhead, making it
infeasible for large $k$; thus these results generally cannot be compared. We have

Corollary 6.2. For any $k > 2$, there exists an algorithm that maintains an approximation of the $k$-th
frequency moment over sliding windows using $O(m^{1-\frac{1}{k}})$ bits.
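To make "sampling-based" concrete, the following Python sketch shows the basic estimator of Alon, Matias and Szegedy for $F_k$: pick a uniform position, count the occurrences $r$ of that element from the position onward, and output $n(r^k - (r-1)^k)$. This is an illustration only: the window is stored as a plain list, the position is passed in explicitly rather than drawn by an $S^3$ sampler, and the names `ams_estimate` and `exact_moment` are hypothetical.

```python
from collections import Counter


def ams_estimate(window, position, k):
    """Basic AMS estimator for the k-th frequency moment.

    window   : list of the elements currently in the window
    position : index of the sampled element (drawn uniformly by the sampler)
    """
    n = len(window)
    value = window[position]
    # r = number of occurrences of the sampled value at or after the position.
    r = sum(1 for x in window[position:] if x == value)
    return n * (r ** k - (r - 1) ** k)


def exact_moment(window, k):
    """Exact F_k = sum_i x_i^k, used here only to check the estimator."""
    return sum(count ** k for count in Counter(window).values())
```

Averaging `ams_estimate` over all positions of a window gives exactly `exact_moment`, so the estimator is unbiased whenever the position is uniform. That uniformity is the only property the estimator needs from the underlying sampler.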

Recently, numerous graph problems were addressed in the streaming environment. Stream elements
represent edges of the graph, given in arbitrary order. (We refer readers to [15] for a detailed explanation
of the model.) One of the fundamental graph problems is estimating the number of small cliques in a graph,
in particular the number of triangles. Effective solutions were proposed by Jowhari and Ghodsi [43],
Bar-Yossef, Kumar and Sivakumar [11] and Buriol, Frahling, Leonardi, Marchetti-Spaccamela and Sohler [15].
The last paper presented an $(\epsilon, \delta)$-approximation algorithm that uses
$O\left(1 + \frac{\log |E|}{|E|} \cdot \frac{1}{\epsilon^2} \cdot \frac{|T_1| + 2|T_2| + 3|T_3|}{|T_3|} \cdot \log \frac{2}{\delta}\right)$
memory ([15], Theorem 2), which is the best result so far. Here, $|T_i|$ represents the number of node-triplets
having $i$ edges in the induced sub-graph. The algorithm is applied on a random sample collected using the
reservoir method. By replacing the reservoir sampling with $S^3$, we obtain the following result.

Corollary 6.3. There exists an algorithm that maintains an $(\epsilon, \delta)$-approximation of the number of triangles
over sliding windows. For sequence-based windows it uses
$O\left(1 + \frac{\log |E_W|}{|E_W|} \cdot \frac{1}{\epsilon^2} \cdot \frac{|T_1| + 2|T_2| + 3|T_3|}{|T_3|} \cdot \log \frac{2}{\delta}\right)$ memory,
where $E_W$ is the set of active edges. Timestamp-based windows add a multiplicative factor of $\log n$.

Following [15], our method is also applicable for incidence streams, where all edges of the same vertex
come together.

The entropy of a stream is defined as $H = -\sum_{i=1}^{m} \frac{x_i}{N} \log \frac{x_i}{N}$, where $x_i$ is as above. The entropy norm is
defined as $F_H = \sum_{i=1}^{m} x_i \log x_i$. Effective solutions for entropy and entropy norm estimations were recently
reported by Guha, McGregor and Venkatasubramanian [39], Chakrabarti, Do Ba and Muthukrishnan [17],
Lall, Sekar, Ogihara, Xu and Zhang [44] and Chakrabarti, Cormode and McGregor [16]. The last paper
presented an algorithm that is based on a variation of reservoir sampling. The algorithm maintains entropy
using $O(\epsilon^{-2} \log \delta^{-1})$ memory, which is nearly optimal. The authors also considered the sliding window model and used
a variant of priority sampling to obtain the approximation. Thus, the worst-case memory guarantees are
not preserved for sliding windows. By replacing priority sampling with $S^3$ we obtain

Corollary 6.4. There exists an algorithm that maintains an $(\epsilon, \delta)$-approximation of entropy on sliding
windows using $O(\epsilon^{-2} \log \delta^{-1} \log n)$ memory.

Moreover, $S^3$ can be used with the algorithm from [17] to obtain $O(1)$ memory for large values of the
entropy norm. This algorithm is based on reservoir sampling and thus can be straightforwardly implemented
in sliding windows. As a result, we build the first solutions with provable memory guarantees on sliding
windows.

$S^3$ algorithms can be naturally extended to some biased functions. Biased sampling [2] is non-uniform,
giving larger probabilities to more recent elements. The distribution is defined by a biased function. We can
apply $S^3$ to implement step biased functions by maintaining $S^3$ over several windows of different lengths and
combining the samples with corresponding probabilities. Our algorithm can extend the ideas of Feigenbaum,
Kannan, Strauss and Viswanathan [32] for testing and spot-checking to sliding windows. Finally, we can
apply $S^3$ to the algorithm of Procopiuc and Procopiuc for density estimation [52], since it is based on the
reservoir algorithm as well.
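One possible reading of the step-biased construction is sketched below in Python; it is an assumption-laden illustration, not the authors' procedure. Suppose uniform samplers are kept over several nested windows (all ending at the newest element), and window $j$ is chosen with probability $q_j$. An element of a given age is then returned with probability equal to the sum of $q_j / L_j$ over the windows of length $L_j$ that still contain it, which is a step function of the age. The names `step_biased_distribution` and `pick_window` are hypothetical.

```python
import random
from fractions import Fraction


def step_biased_distribution(window_lengths, window_probs):
    """Probability of returning the element of each age (0 = newest).

    window_lengths : lengths L_j of the nested windows
    window_probs   : probabilities q_j of using window j (should sum to 1)
    """
    longest = max(window_lengths)
    dist = []
    for age in range(longest):
        # The element of this age is inside every window longer than its age.
        p = sum(Fraction(q) * Fraction(1, length)
                for length, q in zip(window_lengths, window_probs)
                if age < length)
        dist.append(p)
    return dist


def pick_window(window_probs, rng=random):
    """Choose which window's uniform sample to output."""
    weights = [float(q) for q in window_probs]
    return rng.choices(range(len(window_probs)), weights=weights, k=1)[0]
```

With windows of lengths 2 and 4 used with probability 1/2 each, the two newest elements are returned with probability 3/8 each and the two older ones with probability 1/8 each.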
APPENDIX

Lemma 4.1. For any $a$ and $b$, $Incr(\zeta(a, b)) = \zeta(a, b + 1)$.

Proof. We prove the lemma by induction on $b - a$. If $a = b$ then, since $b + 1 = b + 2^{\lfloor \log((b+1)+1-b) \rfloor - 1}$, we
have, by definition of $\zeta(b, b + 1)$,

$$\zeta(b, b + 1) = \langle BS(b, b + 1), \zeta(b + 1, b + 1) \rangle = \langle BS(b, b + 1), BS(b + 1, b + 2) \rangle = Incr(\zeta(b, b)).$$

We assume that the lemma is correct for $b - a < h$ and prove it for $b - a = h$. Let $BS(a, v)$ be the first
bucket of $Incr(\zeta(a, b))$. Let $BS(a, c)$ be the first bucket of $\zeta(a, b)$. By definition, if $\lfloor \log(b + 2 - a) \rfloor =
\lfloor \log(b + 1 - a) \rfloor$ then $v = c$. We have

$$v = c = a + 2^{\lfloor \log(b+1-a) \rfloor - 1} = a + 2^{\lfloor \log(b+2-a) \rfloor - 1}.$$

Otherwise, let $BS(c, d)$ be the second bucket of $\zeta(a, b)$. We have from above $\lfloor \log(b + 2 - a) \rfloor = \lfloor \log(b +
1 - a) \rfloor + 1$, $d - c = c - a$ and $v = d$. Thus

$$v = d = 2c - a = 2\left(a + 2^{\lfloor \log(b+1-a) \rfloor - 1}\right) - a = a + 2^{\lfloor \log(b+1-a) \rfloor} = a + 2^{\lfloor \log(b+2-a) \rfloor - 1}.$$

In both cases $v = a + 2^{\lfloor \log((b+1)+1-a) \rfloor - 1}$ and, by definition of $\zeta$,

$$\zeta(a, b + 1) = \langle BS(a, v), \zeta(v, b + 1) \rangle.$$

By induction, since $b - v < h$, we have $Incr(\zeta(v, b)) = \zeta(v, b + 1)$. Thus

$$\zeta(a, b + 1) = \langle BS(a, v), \zeta(v, b + 1) \rangle = \langle BS(a, v), Incr(\zeta(v, b)) \rangle = Incr(\zeta(a, b)).$$

□
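The identity of Lemma 4.1 can be checked mechanically on bucket boundaries. The Python sketch below is an illustration only: it represents $\zeta(a, b)$ by the list of `(start, end)` pairs of its buckets, and `incr` is reconstructed from the case analysis in the proof above (keep the first bucket, or merge the first two, then recurse), which is an assumption about the Incr procedure of Section 4. The names `floor_log2`, `zeta` and `incr` are hypothetical.

```python
def floor_log2(x):
    """floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def zeta(a, b):
    """Bucket boundaries of the decomposition zeta(a, b), covering indices a..b."""
    if a == b:
        return [(b, b + 1)]
    c = a + 2 ** (floor_log2(b + 1 - a) - 1)
    return [(a, c)] + zeta(c, b)


def incr(buckets, b):
    """Turn the boundaries of zeta(a, b) into those of zeta(a, b + 1)."""
    a = buckets[0][0]
    if a == b:
        # Base case: a single one-element bucket gets a new neighbour.
        return [(b, b + 1), (b + 1, b + 2)]
    if floor_log2(b + 2 - a) == floor_log2(b + 1 - a):
        first, rest = buckets[0], buckets[1:]
    else:
        # The first two buckets have equal width and are merged.
        first, rest = (buckets[0][0], buckets[1][1]), buckets[2:]
    return [first] + incr(rest, b)
```

For instance, `zeta(0, 3)` is `[(0, 2), (2, 3), (3, 4)]`, and applying `incr` to it yields `zeta(0, 4)`. Only boundaries are tracked here; the actual bucket structures also carry their random samples.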

Lemma 4.2. For any $t$ with a positive number of active elements, we are able to maintain one of the
following:

1. $\zeta(l(t), N(t))$,
or

2. $BS(y_t, z_t), \zeta(z_t, N(t))$,

where $y_t < l(t) \le z_t$, $z_t - y_t \le N(t) + 1 - z_t$ and all random samples are independent.

Proof. We prove the lemma by induction on $t$. First we assume that $t = 0$. If no element arrives at time 0,
the stream is empty and we do nothing. Otherwise, we put $\zeta(0, 0) = BS(0, 1)$, and for any $i$, $0 < i \le N(0)$,
we generate $\zeta(0, i)$ by executing $Incr(\zeta(0, i - 1))$. Therefore, at the end of this step, we have $\zeta(0, N(0)) =
\zeta(l(0), N(0))$. So case (1) is true.

We assume that the lemma is correct for $t$ and prove it for $t + 1$.

1. If for $t$ the window is empty, then the procedure is the same as for the basic case.

2. If for $t$ we maintain case (1), then we have three sub-cases.

(a) If $p_{l(t)}$ is not expired at the moment $t + 1$, then $l(t + 1) = l(t)$. Similar to the basic case, we
apply the Incr procedure for every new element with index $i$, $N(t) < i \le N(t + 1)$. Due to the
properties of Incr, we have at the end $\zeta(l(t + 1), N(t + 1))$. Therefore case (1) is true for $t + 1$.

(b) If $p_{N(t)}$ is expired, then our current bucket structures represent only expired elements. We delete
them and apply the procedure for the basic case.

(c) The last sub-case is the one when $p_{N(t)}$ is not expired and $p_{l(t)}$ is expired. Let $\langle BS_1, \dots, BS_h \rangle$,
($BS_i = BS(v_i, v_{i+1})$) be all buckets of $\zeta(l(t), N(t))$. Since $p_{N(t)}$ is not expired, there exists
exactly one bucket structure, $BS_i$, such that $p_{v_i}$ is expired and $p_{v_{i+1}}$ is not expired. We can find
it by checking all the bucket structures, since we store timestamps for the $p_{v_i}$s. We put

$$y_{t+1} = v_i, \quad z_{t+1} = v_{i+1}.$$

We have by definition

$$\zeta(z_{t+1}, N(t)) = \zeta(v_{i+1}, N(t)) = \langle BS_{i+1}, \dots, BS_h \rangle.$$

Applying the Incr procedure to all new elements, we construct $\zeta(z_{t+1}, N(t + 1))$. Finally, we have:

$$z_{t+1} - y_{t+1} = v_{i+1} - v_i = 2^{\lfloor \log(N(t)+1-v_i) \rfloor - 1} \le \frac{1}{2}(N(t) + 1 - v_i) = \frac{1}{2}(N(t) + 1 - y_{t+1}).$$

Therefore $z_{t+1} - y_{t+1} \le N(t) + 1 - z_{t+1} \le N(t + 1) + 1 - z_{t+1}$. Thus, case (2) is true for
$t + 1$. We discard all non-used bucket structures $BS_1, \dots, BS_{i-1}$.

3. Otherwise, for $t$ we maintain case (2). Similarly, we have three sub-cases.

(a) If $p_{z_t}$ is not expired at the moment $t + 1$, we put $y_{t+1} = y_t$, $z_{t+1} = z_t$. We have

$$z_{t+1} - y_{t+1} = z_t - y_t \le N(t) + 1 - z_t \le N(t + 1) + 1 - z_{t+1}.$$

Again, we add the new elements using the Incr procedure and we construct $\zeta(z_{t+1}, N(t + 1))$.
Therefore case (2) is true for $t + 1$.

(b) If $p_{N(t)}$ is expired, we apply exactly the same procedure as for 2.b.

(c) If $p_{z_t}$ is expired and $p_{N(t)}$ is not expired, we apply exactly the same procedure as for 2.c.

Therefore, the lemma is correct. □

Lemma 4.3. It is possible to generate a random sample $Y = Y(Q_1)$ of $B_1$, with the following distribution:

$$P(Y = p_{b-i}) = \frac{\beta}{(\beta + i)(\beta + i - 1)}, \quad 0 < i < \alpha,$$

$$P(Y = p_a) = \frac{\beta}{\beta + \alpha - 1}.$$

$Y$ is independent of $R_1, R_2, Q_2$ and can be generated within constant memory and time, using $Q_1$.

Proof. Let $\{H_j\}_{j=1}^{\alpha-1}$ be a set of zero-one independent random variables such that

$$P(H_j = 1) = \frac{\alpha\beta}{(\beta + j)(\beta + j - 1)}.$$

Let $D = B_1 \times \{0, 1\}^{\alpha-1}$ and $Z$ be the random vector with values from $D$, $Z = \langle Q_1, H_1, \dots, H_{\alpha-1} \rangle$. Let
$\{A_i\}_{i=1}^{\alpha-1}$ be a set of subsets of $D$:

$$A_i = \{\langle q_{b-i}, a_1, \dots, a_{i-1}, 1, a_{i+1}, \dots, a_{\alpha-1} \rangle \mid a_j \in \{0, 1\}, j \ne i\}.$$

Finally we define $Y$ as follows:

$$Y = \begin{cases} p_{b-i}, & \text{if } Z \in A_i, \ 1 \le i < \alpha, \\ p_a, & \text{otherwise.} \end{cases}$$

Since $Q_1$ is independent of $R_1, R_2, Q_2$, $Y$ is independent of them as well. We have

$$P(Y = p_{b-i}) = P(Z \in A_i) = P(Q_1 = q_{b-i}, H_i = 1, H_j \in \{0, 1\} \text{ for } j \ne i)$$

$$= P(Q_1 = q_{b-i}) P(H_i = 1) P(H_j \in \{0, 1\} \text{ for } j \ne i) = P(Q_1 = q_{b-i}) P(H_i = 1)$$

$$= \frac{1}{\alpha} \cdot \frac{\alpha\beta}{(\beta + i)(\beta + i - 1)} = \frac{\beta}{(\beta + i)(\beta + i - 1)}.$$

Also,

$$P(Y = p_a) = 1 - \sum_{i=1}^{\alpha-1} P(Y = p_{b-i}) = 1 - \sum_{i=1}^{\alpha-1} \frac{\beta}{(\beta + i)(\beta + i - 1)} = 1 - \beta\left(\frac{1}{\beta} - \frac{1}{\beta + \alpha - 1}\right) = \frac{\beta}{\beta + \alpha - 1}.$$

By definition of $A_i$, the value of $Y$ is uniquely defined by $Q_1$ and exactly one $H$. Therefore, the generation
of the whole vector $Z$ is not necessary. Instead, we can calculate $Y$ by the following simple procedure.
Once we know the index of $Q_1$'s value, we generate the corresponding $H_i$ and calculate the value of $Y$. We
can omit the generation of the other $H$s, and therefore we need constant time and memory. □

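Lemma 4.4. It is possible to generate a zero-one random variable $X$ such that $P(X = 1) = \frac{\alpha}{\beta + \gamma}$. $X$
is independent of $R_1, R_2, Q_2$ and can be generated using constant time and memory.

Proof. Since $\gamma$ is unknown, it cannot be generated by flipping a coin; a slightly more complicated procedure
is required.

Let $Y(Q_1)$ be the random variable from Lemma 4.3. We have

$$P(Y \text{ is not expired}) = \sum_{i=1}^{\gamma} P(Y = q_{b-i}) = \sum_{i=1}^{\gamma} \frac{\beta}{(\beta + i)(\beta + i - 1)} = \sum_{i=1}^{\gamma} \beta\left(\frac{1}{\beta + i - 1} - \frac{1}{\beta + i}\right) = \beta\left(\frac{1}{\beta} - \frac{1}{\beta + \gamma}\right) = \frac{\gamma}{\beta + \gamma}.$$

Therefore $P(Y \text{ is expired}) = \frac{\beta}{\beta + \gamma}$.

Let $S$ be a zero-one variable, independent of $R_1, R_2, Q_2, Y$ such that $P(S = 1) = \frac{\alpha}{\beta}$.

We put

$$X = \begin{cases} 1, & \text{if } Y \text{ is expired AND } S = 1, \\ 0, & \text{otherwise.} \end{cases}$$

We have

$$P(X = 1) = P(Y \text{ is expired}, S = 1) = P(Y \text{ is expired}) P(S = 1) = \frac{\beta}{\beta + \gamma} \cdot \frac{\alpha}{\beta} = \frac{\alpha}{\beta + \gamma}.$$

Since $Y$ and $S$ are independent of $R_1, R_2, Q_2$, $X$ is independent of them as well. Since we can determine
if $Y$ is expired within constant time, we need a constant amount of time and memory. □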

Lemma 4.5. It is possible to construct a random sample $V$ of all non-expired elements using only the
data of $BS_1$, $BS_2$ and constant time and memory.

Proof. Our goal is to generate a random variable $V$ that chooses a non-expired element w.p. $\frac{1}{n}$. Let $X$ be
the random variable generated in the previous lemma. We define $V$ as follows.

$$V = \begin{cases} R_1, & R_1 \text{ is not expired AND } X = 1, \\ R_2, & \text{otherwise.} \end{cases}$$

Let $p$ be a non-expired element. If $p \in B_1$, then since $X$ is independent of $R_1$, we have

$$P(V = p) = P(R_1 = p, X = 1) = P(R_1 = p) P(X = 1) = \frac{1}{\alpha} \cdot \frac{\alpha}{\beta + \gamma} = \frac{1}{\beta + \gamma} = \frac{1}{n}.$$

If $p \in B_2$, then

$$P(V = p) = (1 - P(R_1 \text{ is not expired}) P(X = 1)) P(R_2 = p) = \left(1 - \frac{\gamma}{\alpha} \cdot \frac{\alpha}{\beta + \gamma}\right) \frac{1}{\beta} = \frac{1}{\beta + \gamma} = \frac{1}{n}.$$

□
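The chain of probabilities in Lemmas 4.3 to 4.5 can be verified with exact rational arithmetic. The Python sketch below is a numerical check of the formulas only, not an implementation of the sampler. It assumes that $B_1$ holds $\alpha$ elements of which the last $\gamma < \alpha$ are non-expired, that $B_2$ holds $\beta \ge \alpha$ non-expired elements, and that $R_1$, $R_2$ are uniform over $B_1$, $B_2$; the function name `sample_probabilities` is hypothetical.

```python
from fractions import Fraction


def sample_probabilities(alpha, beta, gamma):
    """Return (P(V = p) for p in B_1, P(V = p) for p in B_2), p non-expired."""
    # Lemma 4.3: distribution of Y over p_{b-i}, 0 < i < alpha, and p_a.
    p_y = {i: Fraction(beta, (beta + i) * (beta + i - 1))
           for i in range(1, alpha)}
    p_y_first = 1 - sum(p_y.values())
    assert p_y_first == Fraction(beta, beta + alpha - 1)

    # Lemma 4.4: X = 1 iff Y is expired and S = 1, with P(S = 1) = alpha / beta.
    p_y_not_expired = sum(p_y[i] for i in range(1, gamma + 1))
    p_x = (1 - p_y_not_expired) * Fraction(alpha, beta)

    # Lemma 4.5: V = R_1 if R_1 is not expired and X = 1, else V = R_2.
    p_in_b1 = Fraction(1, alpha) * p_x
    p_in_b2 = (1 - Fraction(gamma, alpha) * p_x) * Fraction(1, beta)
    return p_in_b1, p_in_b2
```

For every admissible choice of the parameters both returned values equal $\frac{1}{\beta + \gamma}$, i.e. every non-expired element is equally likely, as the lemma claims.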



Lemma 4.7. The memory usage of maintaining a random sample within the time-based window has a
lower bound $\Omega(\log n)$.

Proof. Let $D$ be a stream with the following property. For timestamp $i$, $0 \le i \le 2t_0$, we have $2^{2t_0 - i}$
elements and for $i > 2t_0$, we have exactly one element per timestamp.

For timestamp $0 \le i \le t_0$, the probability to choose $p$ with $T(p) = i$ at the moment $t_0 + i - 1$ is

$$\frac{2^{2t_0 - i}}{\sum_{j=i}^{t_0+i-1} 2^{2t_0 - j}} = \frac{2^{2t_0 - i}}{2^{t_0 - i + 1} \sum_{j=0}^{t_0-1} 2^{j}} = \frac{2^{t_0 - 1}}{\sum_{j=0}^{t_0-1} 2^{j}} = \frac{2^{t_0 - 1}}{2^{t_0} - 1} > \frac{1}{2}.$$

Therefore, the expected number of distinct timestamps that will be picked between moments $t_0 - 1$ and
$2t_0 - 1$ is at least $\sum_{i=t_0-1}^{2t_0-1} \frac{1}{2} = \frac{t_0 + 1}{2}$. So, with a positive probability we need to keep in memory at least
this many distinct elements at the moment $t_0$. The number of active elements $n$ at this moment is at least $2^{t_0}$. Therefore
the memory usage at this moment is $\Omega(\log n)$, with positive probability. We can conclude that $\log(n)$ is a
lower bound for memory usage. □
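The probability bound used in this proof is easy to confirm with exact arithmetic. The Python sketch below assumes, as the displayed sum suggests, that the window at moment $t_0 + i - 1$ contains the timestamps $i, \dots, t_0 + i - 1$ and that timestamp $j$ carries $2^{2t_0 - j}$ elements; the function name `pick_probability` is hypothetical.

```python
from fractions import Fraction


def pick_probability(t0, i):
    """Probability that a uniform sample at moment t0 + i - 1 has timestamp i."""
    # Elements of the oldest active timestamp i.
    oldest = 2 ** (2 * t0 - i)
    # All active elements: timestamps i, ..., t0 + i - 1.
    active = sum(2 ** (2 * t0 - j) for j in range(i, t0 + i))
    return Fraction(oldest, active)
```

For every $t_0 \ge 1$ and $0 \le i \le t_0$ the value equals $\frac{2^{t_0-1}}{2^{t_0}-1}$, which is strictly larger than $\frac{1}{2}$: the oldest timestamp always holds more than half of the window.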

Lemma 5.1. Let $0 \le i < k$. For any $t$ with more than $i$ active elements, we are able to maintain one of
the following:

1. $\zeta(l(t), N(t) - i)$,
or

2. $BS(y_t, z_t), \zeta(z_t, N(t) - i)$,

where $y_t < l(t) \le z_t$ and $z_t - y_t \le N(t) + 1 - i - z_t$ and all random samples of the bucket structures
are independent.

Proof. The proof is the same as in Lemma 4.2, except for cases 1, 2.b, 3.b. For these cases, when the current
window is empty, we keep it empty unless more than $i$ elements are active. We can do this using our auxiliary
array. Also, when new elements arrive, some of them may be expired already (if we kept them in the array).
We therefore cannot apply the Incr procedure for every "new" element. Instead, we should first skip all expired
elements and then apply Incr. The rest of the proof remains the same. □

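Lemma 5.2. $R_{a+1}^{b+1}$ can be generated using independent $R_a^b$, $R_1^{b+1}$ samples only.

Proof. The algorithm is as follows.

$$R_{a+1}^{b+1} = \begin{cases} R_a^b \cup \{b + 1\}, & \text{if } R_1^{b+1} \in R_a^b, \\ R_a^b \cup R_1^{b+1}, & \text{otherwise.} \end{cases}$$

Let $X = \{x_1, \dots, x_{a+1}\}$ be a set of points from $[1, b + 1]$, such that $x_1 < x_2 < \dots < x_a < x_{a+1}$.
If $x_{a+1} < b + 1$, then we have

$$P(R_{a+1}^{b+1} = X) = \sum_{j=1}^{a+1} P(R_1^{b+1} = x_j) P(R_a^b = X \setminus \{x_j\}) = (a + 1) \cdot \frac{1}{b + 1} \cdot \frac{1}{\binom{b}{a}} = \frac{1}{\binom{b+1}{a+1}}.$$

Otherwise,

$$P(R_{a+1}^{b+1} = X) = P(R_a^b = X \setminus \{b + 1\}, R_1^{b+1} \in X) = \frac{1}{\binom{b}{a}} \cdot \frac{a + 1}{b + 1} = \frac{1}{\binom{b+1}{a+1}}.$$

□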
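For small parameters the uniformity claim of Lemma 5.2 can also be confirmed by exhaustive enumeration. The Python sketch below is a check under the reading that $R_a^b$ is uniform over all $a$-subsets of $[1, b]$ and $R_1^{b+1}$ is uniform over $[1, b + 1]$; the function name `lemma_5_2_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def lemma_5_2_distribution(a, b):
    """Exact distribution of R_{a+1}^{b+1} built from R_a^b and R_1^{b+1}."""
    dist = {}
    # Probability of one particular pair (R_a^b, R_1^{b+1}).
    p = Fraction(1, comb(b, a)) * Fraction(1, b + 1)
    for r_ab in combinations(range(1, b + 1), a):
        for r1 in range(1, b + 2):
            result = set(r_ab)
            # On a collision the new last point b + 1 is added instead.
            result.add(b + 1 if r1 in result else r1)
            key = frozenset(result)
            dist[key] = dist.get(key, 0) + p
    return dist
```

For `a = 2`, `b = 4` the returned dictionary has $\binom{5}{3} = 10$ keys, each with probability $\frac{1}{10}$, matching the two cases computed in the proof.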